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Abstract 

The effect of homologous recombination (HR) on the evolution of microbial genomes remains contentious as competing 
hypotheses seek to explain the evolutionary dynamics of microbial species. Evidence for HR between microbial genomes is 
widespread, and this process has been proposed to act as a cohesive force that can constrain the diversification of microbial 
lineages. We seek to characterize the evolutionary dynamics of sympatric populations to explore the impact of HR on 
microbial speciation. We describe a simple equation for quantifying the cohesive effect of HR on microbial populations as 
a function of their nucleotide divergence, |i/p=7i g 10~ 207r 9. The model was verified using a forward-time microbial 
population simulator that can explore the evolutionary dynamics of sympatric populations in nonoverlapping niche space. 
The model was also evaluated using multilocus sequence data from a range of microbial species, providing criteria for 
dividing them into either cohesively recombining or clonally diverging lineages. We conclude that models of microbial 
diversification that appear contradictory can be explained in a unified manner as the natural and predictable consequence of 
variation in a small number of population parameters. 
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Introduction 

Widespread evidence of horizontal gene transfer (HGT) 
among bacteria and archaea has provoked ongoing debate 
as to whether genetic clusters in these organisms represent 
species or simply represent points along a continuum of ge- 
netic exchange. A variety of evolutionary forces have been 
invoked in these debates, but chief among them are periodic 
selection, genetic drift, and the effects of both homologous 
recombination (HR) and nonhomologous recombination 
(NR). Several different models have been proposed to ex- 
plain the evolution of microbial species (Fraser et al. 
2009). The ecotype model, for example, predicts that peri- 
odic selection and genetic drift act together to create iso- 
lated genetic clusters that represent ecologically distinct 
groups or "ecotypes" (Cohan 2002). The ecotype model ex- 
plains well the patterns of divergence observed in Bacillus 
spp. from Evolution Canyon, Israel, where distinct genetic 
clusters occur in ecologically distinct locations within the 
canyon (Cohan and Perry 2007). However, organisms like 
Neisseria meningitidis, Streptococcus pneumoniae, and 



Helicobacter pylori have high rates of gene exchange (Feil 
et al. 2000; Falush et al. 2001), and the application of 
the ecotype model to such recombining populations is com- 
plicated by the reshuffling of genes made possible by HR. 
Consequently, a variety of competing species models have 
been proposed and reviewed (Stackebrandt et al. 2002; 
Gevers et al. 2005; Konstantinidis et al. 2006; Nesbo 
et al. 2006; Whitaker 2006; Achtman and Wagner 2008; 
Fraser et al. 2009). 

Although HGT can act as a diversifying force when it in- 
volves the acquisition of new genes and traits, HGTcan also 
act as a cohesive force by mediating homologous replace- 
ment of existing genes (Fraser et al. 2007). HR involves the 
nonreciprocal unidirectional transfer of a homologous seg- 
ment of DNA from a donor to a recipient (analogous to in- 
terallelic gene conversion in eukaryotes). Because this form 
of HR is nonreciprocal, it decreases nucleotide divergence 
between donor and recipient, with the net change in sim- 
ilarity being a function of the sequence divergence prior to 
replacement. The HR rate depends on the similarity of the 
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donor and recipient in the stretch of DNA to be combined. 
This relationship likely depends upon local sequence similar- 
ity at the ends of recombination tracts referred to as mini- 
mum efficiently processed segment (MEPS) (Shen and 
Huang 1986; Hsieh et al. 1992; Rao and Radding 1995; 
Majewski and Cohan 1999b). The dependence of HR rate 
on local sequence similarity has been described as p x 
10~ 207 \ where n is the nucleotide divergence between 
the donor and the recipient stretch involved in recombina- 
tion, p is the rate at which recombination events are initi- 
ated, and -20 is the distance factor of recombination 
used to modify the success rate of recombination events 
based on the value of n (Fraser et al. 2007). 

We describe a simple equation that can be used to quan- 
tify the cohesive effect of HR on microbial populations and 
to make predictions about their evolutionary dynamics. Pre- 
dictions generated from this equation are verified with the 
use of a microbial population simulation. The simulation is 
used to test predictions from the equation, we describe and 
to determine whether the cohesive effects of HR are suffi- 
cient to prevent core genome divergence between sympat- 
ic populations. We seek to define the range of mutation 
and recombination rates that promote the divergence or 
convergence of sympatric populations, values of relevance 
to models of microbial speciation. We use the terms conver- 
gence and divergence to indicate a change in sequence sim- 
ilarity across the loci of the core genome. It is important to 
note that HR can serve as a mechanism for adaptation and 
differentiation (Levin and Cornejo 2009; Shapiro et al. 
2009), but because our framework is not designed to ac- 
count for the effects of selection and adaptation, these ef- 
fects are not considered. Although simplifying assumptions 
were made in both theory and simulation, the framework 
we present provides a simple null model that can be used 
to make predictions about the evolutionary dynamics of 
microbial populations and species. 

Materials and Methods 

In Results and Discussion, we describe equations that we de- 
veloped from theory to determine the impact of HR on the 
evolutionary dynamics of microbial populations. Predictions 
from these equations were tested using a new population 
simulator called bactsimDF (for bacterial s//r?ulation different 
and fixed) that we developed to model the evolution of mi- 
crobial populations forward in time. The bactsimDF program 
simulates two coexisting neutral populations that start at 
a user-specified level of interpopulation nucleotide diver- 
gence and population size. Individuals in each population 
have equal access to DNA from both populations without re- 
spect to population boundaries (at a rate modified by the level 
of nucleotide divergence). However, after each round of re- 
combination, the new generation for each population is sam- 
pled at random from within each population. This allows drift 



to act independently on each population and prevents a mul- 
tiple population simulation from turning into a single pop- 
ulation simulation with a recent common ancestor. In this 
way bactsimDF simulates ecologically isolated sympatric 
populations and approximates the conditions required for 
divergence via the ecotype model, in which an adaptive mu- 
tation permits a subpopulation to occupy a new ecological 
niche and to escape the effects of selection and drift in the 
ancestral population (Cohan 2002). 

The bactsimDF simulator was modified from the algo- 
rithm and code of the forward in time population simulator 
forwsim (Padhukasahasram et al. 2008). The algorithm used 
by forwsim to increase simulation speed is detailed else- 
where (Padhukasahasram et al. 2008), so we will only de- 
scribe it briefly here. The populations being modeled are 
neutral Wright-Fisher populations that are constant in size. 
Individual chromosomes are modeled as collections of poly- 
morphic sites, allowing new mutations to appear only at 
nonpolymorphic sites. Mutations are tracked by their loca- 
tion on the chromosome. Further information is not neces- 
sary, as the simulation works on a pseudoinfinite sites 
assumption. To improve the speed of the program, forwsim 
simulates the future genealogy of the population in the next 
eight generations and tracks all individuals still in the pop- 
ulation at that time and any individuals that have recom- 
bined with any of those present individuals in that span 
of time. Mutations or recombination events that occur be- 
tween individuals that do not contribute to future genera- 
tions are not simulated. The user may also specify the 
number of generations after which homogenous features 
of the population are purged, that is, if a mutation goes 
to fixation or extinction, it is no longer necessary to include 
that polymorphic site in every individual's chromosome and 
is removed. If a mutation is no longer carried in the popu- 
lation, then it is removed from the list of existing mutations, 
and new mutations may again enter the population at that 
site (i.e., that location becomes nonpolymorphic). 

The bactsimDF simulation program is modified substan- 
tially from forwsim in order to model HR (i.e., interallelic 
gene conversion) instead of crossing over. In addition, the 
mean tract length has been modified to be user specified 
with the tract length of individual events realized as a draw 
from a geometric distribution, as in the coalescent simula- 
tion ms (Hudson 2002). The probability of individual recom- 
bination events is determined as, R = p1 0~ dn , where p is the 
recombination rate, n is the average nucleotide divergence, 
and d is the distance factor of recombination. The distance 
factor of recombination (i.e., the degree to which recombi- 
nation rate declines with increasing sequence divergence) is 
the empirically defined value for the slope of the log-linear 
line that describes the relationship between /?, p, and n (as 
described in Fraser et al. 2007). For the purpose of modeling 
individual recombination events, the value of n is calculated 
between each individual donor and recipient. The program 
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calculates nucleotide divergence between populations every 
500 generations by comparing a user-specified number of 
individuals in both populations. User input includes effective 
population size, sample (output) size, genome length, total 
number of generations, the number of generations be- 
tween deletions of homogenous features, recombination 
rate, mutation rate, tract length, and the distance factor. 
The simulations we describe used the following parameters: 
effective population size, 2,000 (1,000 in each population); 
sample size, 200; genome length, 500,000; total genera- 
tions, 25,000,000; mutation rate, 5 x 10 -5 individual" 1 
generation -1 ; recombination rate, variable; time between 
deletions, 50 generations; tract length, 500; distance factor, 
-20. bactsimDF source code and precompiled binaries are 
freely available at https://sites.google.com/site/doroghazi/. 

Multilocus sequence data were used to estimate values of 
jo/p and % for a range of microbial species in order to evaluate 
these parameter estimates in the context of our model. Multi- 
locus sequence data and ClonalFrame output files from Vos 
and Didelot (2009) were graciously provided by Dr Michiel 
Vos. Nucleotide diversity was calculated with a Perl script. Val- 
ues for jj/p and n are provided in supplementary table S1 
(Supplementary Material online). 



Results and Discussion 

Results Produced from Theory 

We sought to characterize the impact of HR on microbial spe- 
ciation and we will first consider a pair of populations with 
equal mutation and recombination rates, with no barriers 
to recombination. In the absence of HR, we would expect 
the populations to diverge over time due to mutation. The 
effect of mutation in this case is straightforward, as mutations 
accumulate in each lineage at the mutation rate of individuals 
in the population (Kimura 1968), thus, we naturally expect 
that two isolated populations will diverge at twice their mu- 
tation rate or 2jll. Estimating the effect of HR on a population, 
however, is somewhat more complex. The HR rate for any 
given fragment of DNA is expected to vary as a log-linear 
function of the nucleotide divergence between donor and 
recipient. The effect of sequence similarity on HR rate has 
been quantified in Escherichia coli (Vulic et al. 1 997), S. pneu- 
moniae (Majewski et al. 2000), Bacillus subtilis, and Bacillus 
mojavensis (Zawadzki et al. 1995) as roughly R = p10 -2071 , 
where R is the modified recombination rate, p is the recom- 
bination rate when there are no differences between sequen- 
ces, n is the nucleotide divergence between sequences, and 
-20 is the distance factor of recombination (Fraser et al. 
2007). 

To quantify the cohesive effect of HR between two pop- 
ulations, we start by assuming that the two populations will 
converge due to HR at the rate 2p7i g 10~ 207 \ where p is 
the recombination rate and 7i g is the average nucleotide 



divergence between the populations (i.e., the average nu- 
cleotide divergence between any two sequences which 
are chosen at random from each of the two populations). 
In the simplest terms, this result is obtained by considering 
each HR event as a type of mutation that has the ability to 
reduce nucleotide divergence between populations (due to 
gene conversion). Unlike the rate of point mutation, how- 
ever, which is assumed to be constant, the HR rate is strongly 
influenced by the nucleotide similarity between donor and 
recipient (as described above). In addition, the number of 
nucleotide changes introduced by each HR event is a func- 
tion of the nucleotide divergence between the donor and 
the recipient for the DNA fragment being exchanged. 

Given the above, the forces of mutation and HR between 
two populations are equal when 2ja=2p7t g 1 0~ 207 \ and thus, 
we find that |i/p=7r g 10~ 207r g. This equation describes a line 
that represents the threshold of cohesive recombination be- 
tween any two populations (fig. 1 ). We can see that two pop- 
ulations will diverge when |i/p>7t g 10~ 20TC 9 and converge 
when |i/p<7r g 1 0" 207l 9 . Thus, pairs of populations with values 
of u/p and Tig that fall below the threshold will be cohesive, 
whereas populations above the threshold will be free to di- 
verge clonally through the random accumulation of muta- 
tions. With a simple rearrangement of the equation, we 
can calculate the per generation change in nucleotide diver- 
gence between the populations as A = 2\i - 2p7i g 10~ 207t g, 
and this rate is indicated by the heat map in figure 1 for 
a range of values u/p and n g (It is important to note that 
p is the per-site frequency of recombination, combining 
the rate of occurrence of recombination events and the tract 
lengths of those events.). Calculating the rate of change be- 
tween populations requires us to assume a mutation rate, 
and the heat map depicted figure 1 assumes a mutation rate 
of 1 x 10 -10 nucleotide -1 generation -1 . Lowering or raising 
the mutation rate will vary the rate of change between pop- 
ulations (i.e., the scale of the heat map) but not the direction 
of change (i.e., convergence or divergence) because the 
threshold of cohesive recombination is defined as a function 
of the ratio of mutation and recombination. 

Forward in Time Simulations 

We developed the population simulator bactsimDF to test 
the ability of the equation described above to model the 
evolutionary dynamics of populations as a function of their 
interpopulation nucleotide divergence (7u g ) and the ratio of 
mutation and recombination (\x/p). When evaluated across 
a wide range of parameter values, the results from simula- 
tions were in close agreement with predictions calculated 
directly from theory using the equation 2ju — 2p7i g 10~ 207r g 
(figs. 1 and 2). Simulations run with population parameters 
that fall above the cohesive threshold were observed to 
diverge, whereas those below the threshold converged 
(fig. 1). There was a single exception where a simulation 
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Fig. 1. — The ratio of mutation to HR is expressed as a function of 
mean interpopulation nucleotide divergence for sympatric populations 
that are free to recombine. The line (described by the function 
u/p=7i g 10" 207C 9) represents the predicted threshold that bounds 
populations subject to the cohesive effects of recombination. Pairs of 
populations below the curve are predicted to convergence due to HR, 
whereas populations above the curve are clonal and free to diverge. The 
heat map predicts the rate of change between two populations, 
calculated as A=2\i - 2p7i g 1 0~ 207r 9 , where A is the change in 
nucleotide divergence per generation. Each arrow indicates the results 
from a different bactsimDF simulation run with distinct initial values for 
u/p and Kg. Initial values for each simulation are indicated by the origin 
of each arrow. Arrow tips show the level of nucleotide divergence 
resulting after 25 million generations. 

initiated with parameters that fell below the curve resulted 
in net divergence (as indicated by the arrow below the curve 
which points to the right, fig. 1 , seen as the outlier in fig. 2). 
This outlier likely results from stochastic variation due to the 
proximity of the initial simulation parameters to an unstable 
equilibrium between the forces of recombination and 
mutation. 

For the purposes of testing the equation we describe, 
output from the simulation is evaluated only with respect 
to the overall genomic similarity between two populations. 
Output from the simulation, however, can also be used to 
analyze changes in smaller genomic regions. Such analysis 
could be especially useful in considering, for example, the 
fragmented speciation theory of Retchless and Lawrence 
(2007). The simulation can also be used for modeling allo- 
patric populations because geographical isolation would be 
represented in the simulation by using an interpopulation 
recombination rate of zero. 

The model we describe can be extended to evaluate recom- 
bination between populations that differ in effective popula- 
tion size by accounting for the change in interpopulation 
recombination rate due to the different probability of encoun- 
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Expected Changes in Nucleotide Divergence 

Fig. 2. — Observed changes in nucleotide divergence (as depicted 
in fig. 1) compared with expected changes calculated using recurrent 
application of the formula A =2\i - 2pn g 1 0~ 207r 9 . The line shown has an 
intercept of 0 and a slope of 1; identical observations and expectations 
should fall exactly on this line. The outlier, shown in red, is the one 
simulation that diverged and oscillated instead of converging. 



tering DNA from an individual from the other population (this 
relates to propinquity and the effect should also be influenced 
by differences in census population size). For equal sized pop- 
ulations, p^glO" 20 ^ + p 2 7ig10- 20K g=2p 1 7ig10- 207T g but for 
populations that differ in size the individual interspecies recom- 
bination rate must be calculated for each population. To sim- 
ulate coexisting populations of unequal population size, we 
modified bactsimDF to include populations of size 1800 and 
200 (fig. 3). When averaged across all individuals in both 
populations the recombination rate for populations of unequal 
size and for those of equal size was the same (5 x 10 -4 
individual -1 generation -1 with a tract length of 500), but 
when examined with respect to the different populations, it 
was determined that the smaller population realized a mean 
interpopulation recombination rate of 4.74 x 10 -4 , 5.2 times 
higher than that realized for the larger population. As a result, 
the smaller population is swamped with DNA from the larger 
population, drifting 33.2 times closer to the large population 
than the large moved toward the small (fig. 3). 

In the simulations described above, sequence similarity 
was calculated over the tract being recombined to assess 
the probability of recombination. For comparison, we also 
ran a simulation in which two identical 22-bp regions at 
the start and end of the recombination tract were required 
for a recombination event to occur (as suggested by 
Majewski and Cohan 1999b), and a simulation in which 
a 44-bp stretch of identity was required at one end of 
the recombination tract (as described in Rao and Radding 
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Fig. 3. — Change in nucleotide divergence between two recombin- 
ing populations that differ in size. Circle area is proportional to effective 
population size with 1800 individuals in the large population and 200 in 
the small population. The simulation was started with n g of 0.04 and \i/p 
of 0.00035. 

1 995; Reddy et al. 1 995). These different approaches to as- 
sessing recombination probability did not significantly alter 
simulation results (supplementary fig. S1 A Supplementary 
Material online). The requirement for short regions of iden- 
tity (MEPS) at the termini of recombination tracts results in 
a relative recombination rate that is indistinguishable from 
the rate obtained when calculating the success of recombina- 
tion as a function of sequence similarity, modified by the dis- 
tance factor of -20, across entire recombination tract 
(supplementary fig SIB, Supplementary Material online). 

Relevance to Microbial Species 

We have examined above the effect of HR on two sympatric 
yet ecologically isolated populations. We can also use this 
framework to make hypotheses about the dynamics of core 
genome evolution for individual microbial species. We as- 
sume that the level of nucleotide divergence for the entire 
species is greater than or equal to the level of interpopula- 
tion nucleotide divergence for any two populations within 
the species (i.e., n g < n). We further accept the simplifying 
assumptions that there is complete mixing of individuals, 
constant population size, and lack of selection among the 
individuals in the species (assumptions common to models 
of population genetics). Starting from these assumptions, 
we can calculate values of jo/p and n from multilocus se- 
quence data collected for strains representing a particular 
microbial lineage and use these parameter estimates to 
make a hypothesis about whether the homologous genes 
in any two populations within the lineage are free to diverge 
through clonal processes or are constrained by the cohesive 
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Fig. 4. — Values of nucleotide diversity, HR rate, and mutation rate 
estimated from strains that belong to a range of microbial species 
plotted in relation to the equilibrium threshold for cohesive recombi- 
nation (values plotted on a log scale to allow all points to be visualized). 
These values are intended as estimates of values that may occur 
between different populations within the named species. The re- 
combination distance factor was varied to demonstrate the impact of 
this value on the threshold of cohesive recombination. The distance 
factors that were used are -20 for the solid line (as presented in fig. 1), 
-1 0 for the dotted line, and -30 for the dashed line. Values for p/p are 
calculated as n (r/rn)" 1 from the values of rim in Vos and Didelot (2009) 
and 7i is calculated from the same data sets provided by Dr Michiel Vos. 
A larger set of values and species names are given in supplementary 
table S1 (Supplementary Material online). The arrow shows the average 
nucleotide divergence between Campylobacter coli and Campylobacter 
jejuni based on MLST loci (Dingle et al. 2005) values for C. coli were 
calculated from Lang et al. (2009) as described in supplementary table 
S1 (Supplementary Material online). 

effects of HR. Obviously, violation of assumptions will impact 
the accuracy of these predictions, but a framework for mak- 
ing such null hypotheses provides a valuable tool for explor- 
ing the evolutionary forces that drive the diversification of 
microbial lineages. 

We examined the biological relevance of our model by 
calculating jo/p and n for a range of bacterial and archaeal 
species (fig. 4 and supplementary table S1, Supplementary 
Material online). For this purpose, we used multilocus se- 
quence data that has been described previously (Vos and Di- 
delot 2009). We also chose to examine the impact that 
varying the distance factor of recombination has on the 
threshold of cohesive recombination (fig. 4). Although 
the distance factor has been estimated at -20 in laboratory 
experiments, the degree to which this term varies across the 
domains of life remains poorly characterized. For example, 
a distance factor between -29 and -44 was estimated for 
recombination among Ferroplasma in acid mine drainage 
(Eppley et al. 2007). Different values for the distance factor 
of recombination may result from unique aspects of the re- 
combination machinery in acidophilic Ferroplasma or due to 
fitness costs associated with insertion of divergent DNA 
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(Eppley et al. 2007). The population parameters for a wide 
range of microorganisms can be seen to span the threshold 
that separates recombining and clonal populations regard- 
less of the distance factor (fig. 4). Our model locates the spe- 
cies Bacillus thuringiensis, Bacillus weihenstephanensis, and 
Bacillus cereus well above the threshold of cohesive recom- 
bination (fig. 4). The model predicts that species with these 
parameters will evolve along clonal trajectories, having sub- 
populations able to diverge on separate evolutionary trajec- 
tories as described by the ecotype model. In contrast, H. 
pylori, N. meningitidis, and 5. pneumoniae, which are 
known to be highly recombining (Feil et al. 2000; Falush 
et al. 2001), fall under the threshold of cohesive recombi- 
nation (It should be noted that although H. pylori has an ex- 
ceptionally high recombination rate, this species also has an 
exceptionally high mutation rate which causes the ratio ja/p 
to be somewhat higher than that observed for species with 
lower rates of recombination [e.g., S. pneumonia]). The 
model predicts that the evolution of species having these 
parameters will be dominated by the cohesive effects of re- 
combination. Although the evolutionary significance of HR 
differs dramatically for the species described above, we can 
see that these differences arise as manifestations of simple 
underlying principles. 

Implications and Further Discussion 

The bactsimDF population simulator was designed to test the 
equations we developed to explain the evolutionary dynamics 
of sympatric microbial populations. A limitation of the model 
is the small size of the populations being modeled. This con- 
straint was imposed by computational limitations in order to 
allow simulations to run in a reasonable period of processor 
time. The value we used for population size was selected to 
be large enough to capture dynamics due to genetic drift and 
yet small enough to allow robust simulation. It should also be 
noted that given the simplifying assumptions used in bact- 
simDF the census population size is equal to effective popu- 
lation size. In real microbial populations, the effective 
population size is expected to be vastly smaller than census 
population size. Previous models for evaluating the evolution 
of microbial population have used effective population sizes 
in the range of 500-2000 (Falush et al. 2006) and as small as 
20 (Vetsigian and Goldenfeld 2005). In our simulation, an in- 
crease in population size by an order of magnitude (to 20,000 
individuals) did not noticeably impact simulation results be- 
yond the expectations of stochastic variation (supplementary 
fig. S1/\, Supplementary Material online). 

The simulations we describe have important implications 
for the ability of ecotype model to explain the divergence of 
sympatric lineages. In the ecotype model, a new ecotype 
forms when the acquisition of an adaptive mutation allows 
a derived population to invade a new ecological niche and 
escape from the effects of drift and selection imposed by the 



ancestral population. We show that the HR rates estimated 
for a variety of microbial species are high enough to prevent 
lineage diversification through the ecotype mechanism (fig. 
4). Populations under the threshold of recombination (fig. 1 ) 
would be sufficiently cohesive such that, in the absence of 
barriers to gene exchange, derived populations would be 
unable to diverge across the homologous loci of the core 
genome. Furthermore, the period during which a nascent 
ecotype is formed from an ancestral population would, 
by necessity, be characterized by inequality in population 
size, and this difference in population size would serve to 
decrease the ability of the derived population to escape 
the impact of HR f rom the ancestral population (as described 
in fig. 3). We would predict that populations that fall below 
the cohesive threshold of recombination would be unlikely 
to diverge as described by the ecotype model. Application of 
the ecotype concept would still have utility for characteriz- 
ing the divergence of populations found above the thresh- 
old line for cohesive recombination, such as those of Bacillus 
species. We might speculate that, in recombining popula- 
tions, niche expanding mutations might be found in the aux- 
iliary genome and their exchange within the population 
governed by propinquity. An example of this may be ob- 
served in Vibrio cholerae in which the genes of the core ge- 
nome are mixing globally within species, but auxiliary genes 
associated with integrons cross species boundaries and 
show strong dependence on the geographic location from 
which the strains were isolated (Boucher et al. 201 1). 

Although we have restricted our analyses to the effects of 
HR, NR also serves as a powerful force of evolutionary 
change. NR contrasts with HR in that the former will lead 
to gene acquisition rather than gene replacement, and as 
a result, we would expect that NR will generally manifest 
as a diversifying rather than as a cohesive force and have 
its greatest impacts on the auxiliary genome. Niche expan- 
sion by the acquisition of adaptive genes can promote diver- 
gence of populations above the threshold of cohesive 
recombination consistent with the ecotype model. Our 
model predicts that the acquisition of niche expanding 
genes would not be expected to promote sympatric diver- 
gence within a cohesively recombining population unless 
somehow the acquisition also created a barrier to gene ex- 
change. In this case, it would be the barrier to gene ex- 
change, rather than escape from periodic selection, that 
would be expected to drive lineage divergence. It is impor- 
tant to consider that NR can also have downstream implica- 
tions for HR rates. NR results in the insertion of new 
stretches of DNA that lower the local sequence similarity 
at the conserved boundary sequences. The local decrease 
in sequence similarity has the effect of lowering local HR 
rates and this might result in propagating fronts of diver- 
gence as described by Vetsigian and Goldenfeld (2005). 
The propagating front hypothesis provides a mechanism 
that could promote divergence in highly recombining 
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populations by creating barriers to gene exchange. Further 
research on the interplay of both types of recombination is 
necessary and in the future it should be possible to modify 
bactsimDF to simulate the effects of both HR and NR on 
the core and auxiliary genome and to evaluate the role of 
propagating fronts in lineage diversification. 

The existence of a cohesive recombination threshold that 
circumscribes clonal and recombining populations has been 
predicted previously, and evidence for this threshold has pre- 
viously been tested in several computer simulations (Majewski 
and Cohan 1999b; Vetsigian and Goldenfeld 2005; Falush 
et al. 2006; Fraser et al. 2007, 2009). These simulations have 
considered several scenarios: one population with or without 
temporary division (Falush et al. 2006; Fraser et al. 2007, 
2009), local genomic effects of NR (Vetsigian and Goldenfeld 
2005), and multiple ecotypes with selective sweeps (Majewski 
and Cohan 1999b). The simulation we describe is different 
because it evaluates how the core genome is expected to 
evolve in sympatric populations under the strong assumption 
of niche isolation between the two populations. In other 
words, we predict the effect of HR on any two populations 
that are unable to drive each other to extinction as a result of 
drift or selection. Our intention is to provide a quantitative 
framework based upon theory that can be used to make pre- 
dictions about microbial populations without the need for 
computationally intensive simulation. For example, a species 
which falls below the curve in figure 4 can encompass mul- 
tiple ecologically distinct populations, but cohesive recombi- 
nation will prevent divergence across the loci of the core 
genome. Examples of species that have ecologically divergent 
populations that are not genetically resolved across the loci of 
the core genome include 5. pneumonia (Hanage et al. 2009) 
and Vibrio cholera (Boucher et al. 2011). As a result, we 
would not expect MLST data to reveal ecological differentia- 
tion within these species. In contrast, MLST data could be suf- 
ficient for defining ecological differentiation for those species, 
such as Bacillus spp., that are not subject to the cohesive ef- 
fects of recombination. 

This theory also provides a measure to assess the comple- 
tion of speciation, allowing one to conclude whether the 
core genomes of two genetically divergent lineages could 
converge due to recombination without being driven by se- 
lection. For example, we can consider the case of Campylo- 
bacter coli and Campylobacter jejuni in which interspecies 
HR is hypothesized to be driving species merger (Sheppard 
etal. 2008). From estimates made using MLST data, it is pos- 
sible to estimate intraspecies HR rates for these species. 
Interpreting these rates in the context of our framework 
(fig. 4) indicates that recombination is unlikely to drive co- 
hesion across the core genome of these species (fig. 4). 
Because sequence similarity between species is lower than 
that within species, we can further predict that although 
interspecies recombination is free to occur, merger across 
the core genome is unlikely between these species. Recent 



genomic analysis of C. jejuni and C. coli suggest that al- 
though interspecies recombination has occurred, the two 
species remain clearly distinct (Caro-Quintero et al. 2009; 
Lefebure et al. 2010). Analysis of 96 genome sequences in- 
dicates that interspecies recombination is relatively rare be- 
tween the core genomes of these species (Lefebure et al. 
201 0). The framework we describe provides a tool for mak- 
ing prediction about whether a given level of recombination 
would be expected to promote lineage cohesion. 

The availability of a null model built on a theoretical foun- 
dation has great utility when investigating bacterial popula- 
tion dynamics as it provides some expectations as to how 
populations should behave under simplifying assumptions. 
The model we describe provides a productive framework for 
generating and testing hypotheses about the evolutionary 
dynamics of specific populations. 



Supplementary Material 

Supplementary figure S1 and table S1 are available at 
Genome Biology and Evolution online (http://www.gbe. 
oxfordjournals.org/). 
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